An important part of using Python in the real world is to be able to examine the files you have on your computer!
Different operating systems (Windows, Mac, etc.) organize files in different ways, so you will have to get familiar with how things are organized on your own computer. The Python os module helps with that somewhat, so that we can use the same methods on different computers to do what we want.
In [1]:
import os
print( os.listdir("/") )
The files in our system are on one or more drives; on the drive the files are organized into folders, also known as directories. We can list the contents of any directory by using the os.listdir() method.
The / has a special meaning in most cases: it is the character that divides directory names. So a / by itself means "the root directory of the drive: what I get if I look at C: on Windows or Macintosh HD on the Mac."
So, here we have listed the contents of that root directory (which is a lot more than you'll see in your Explorer or Finder window - all the names starting with '.' and even the ones that are all-lowercase are hidden, if you're on a Mac.)
We can also list the contents of any other directory. For example, the Downloads folder on my Mac is in the folder with my short name (tla), which is in a folder called Users, which is right on the Macintosh HD. So I say /Users/tla/Downloads to get at it. That is: starting at the root (/), look in Users; starting at Users (/Users/), look in tla; starting at tla (/Users/tla/), look in Downloads.
If you are on Windows you will probably want something like /Users/Tara Andrews/Downloads instead.
Let's print out each of the filenames we find in that directory.
In [2]:
for f in os.listdir("/Users/tla/Downloads"):
    print(f)
Now open a Finder window, or an Explorer window, and see how the name here corresponds to the structure of the directories on your drive. This is the filesystem.
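A listing like this mixes files and folders together. One way to tell them apart is os.path.isdir(). The following is only a sketch and not part of the original notebook: the helper name describe_directory is made up for illustration, and it looks at the root directory so that it does not depend on my own folder names.

```python
import os

def describe_directory(path):
    """Return a sorted list of (name, kind) pairs for everything in path."""
    entries = []
    for name in sorted(os.listdir(path)):
        # os.listdir() gives bare names, so build the full path before testing it
        full_path = os.path.join(path, name)
        kind = "directory" if os.path.isdir(full_path) else "file"
        entries.append((name, kind))
    return entries

# Show what is at the root of the drive, one entry per line
for name, kind in describe_directory("/"):
    print(kind, name)
```

Swap "/" for any other directory you are curious about.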
A Python program (or any other program running on your computer, for that matter) always runs from somewhere in this filesystem: some specific directory on your drive. To see where you are running now, you can ask for the current working directory, like so:
In [3]:
print(os.getcwd())
This will look a little different on Mac vs. Windows, because they use different notations for the path, which is the chain of directories that leads to a particular place in the directory structure.
C:\Users\Tara L Andrews\Documents\IPython Notebooks
vs.
/Users/tla/Documents/2014 FS
So you'll need to get used to the convention on whichever sort of computer you have. The / (or \, as the case may be) is called the path separator, separating the chain of directories.
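If you would rather not remember which separator your computer uses, the os module can tell you. This is a small sketch of that idea, using os.sep, os.path.join() and os.path.split(); the directory names are just the ones from my Mac example above and do not need to exist on your computer.

```python
import os

# os.sep is the path separator for the computer running this code:
# "/" on Mac and Linux, "\" on Windows
print(os.sep)

# os.path.join() glues directory names together with the right separator
relative_path = os.path.join("Users", "tla", "Downloads")
print(relative_path)

# os.path.split() goes the other way, splitting off the last name in the path
parent, last = os.path.split(relative_path)
print(parent, "|", last)
```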
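The two important lessons to take from this are:
Files are organized into directories, and a path describes where in that structure something sits.
Your program is always running from some directory in the filesystem; os.getcwd() will tell you which one.
Once you know where you are on the filesystem, you can go somewhere else by "changing directory". This is accomplished with the os.chdir() method. For example: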
In [4]:
os.chdir("/Users/tla") # go to the directory tla in the directory Users at the base of the drive
os.chdir("..") # go one directory up, to the directory that holds this one (/Users)
print(os.getcwd()) # see where we are
os.chdir("./tla/Downloads") # go to the directory Downloads in the directory tla that is here where we are
print(os.getcwd()) # see where we are now
There are a few special names when you look at files and directories:
. - the directory I'm standing in
.. - the directory that holds the directory I'm standing in
/ - (all by itself) the base ("root") directory
The / is a little more complicated than that if you are on Windows; really there you should be using \ instead of /, but Python lets you call it / anyway. Moreover, it is the base directory of the drive you are currently looking at (e.g. C:, D:, or whatever.) Macs don't deal with drives in the same way, so if you are on a Mac then you can ignore all this.
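To see what these special names stand for on your own computer, you can ask Python to turn each one into a full path. This is a minimal sketch using os.path.abspath(), which is not used elsewhere in this notebook; the output depends entirely on where you are running it.

```python
import os

here = os.path.abspath(".")      # "." is the current working directory
parent = os.path.abspath("..")   # ".." is the directory that holds it
root = os.path.abspath(os.sep)   # the root of the current drive

print("here:  ", here)
print("parent:", parent)
print("root:  ", root)
```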
So by saying os.chdir("..") we have moved up a directory, into the one that holds the directory we were in. We could keep doing this all the way up to the root, if we felt like it. But generally it is a good idea to stay in your home directory if you can.
Wait, what is a home directory?
In [5]:
print(os.getenv("HOME", "not Mac/Unix"))
print(os.getenv("HOMEPATH", "not Windows" )) # alternatively, USERPROFILE
If you're on a Mac, this is the directory that has the little icon of a house in the Finder. It is where all of the files you work with ought to be stored, and normally has your Documents folder and Downloads folder and all of that.
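The HOME variable is usually not set on Windows, so os.getenv("HOME") tends to come back empty there. One way around this, sketched here as an alternative rather than as what the rest of this notebook does, is os.path.expanduser("~"), which looks up the home directory in whatever way suits the operating system.

```python
import os

# expanduser("~") asks Python for the home directory in a way that
# works on Mac, Linux and Windows alike
home = os.path.expanduser("~")
print(home)
```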
In [6]:
my_home = os.getenv("HOME")
for f in os.listdir( my_home ):
    print(f)
Now, if I want to, I can look to see what we have in my Projects folder - this is where I usually keep the code I've worked on. I can do this by joining the directory name ('Projects') to my home directory (which I've saved in my_home).
In [17]:
doc_path = os.path.join( my_home, "Projects" )
print( "The joined path is:", doc_path )
os.listdir( doc_path )
If we are sure we know where we are, and that the folder we want to see is in the same directory we are in, then we don't have to do the path joining after all.
In [15]:
os.chdir( my_home ) # Change to my home directory
os.listdir( "Dropbox/book" ) # Look at the 'book' folder in the Dropbox folder in my home directory
In [9]:
fh = open( os.path.join( my_home, "Dropbox/beef_stew.txt" ), 'r' ) # Open the file
contents = fh.read() # Read its contents
fh.close() # Close the file
print(contents) # Do something with the contents
There are three steps to dealing with files from Python: open the file, read from it (or write to it), and close it.
When you open a file, you make something called a filehandle. The filehandle, well, handles the file - does the reading, writing, closing, etc. that you need to be done.
When you read a file, you have two choices: read in the entire thing, or read it line-by-line. Usually you'll choose to do the latter, so that you can process or analyze each line as you get it. So, for example, we could add line numbers to what is in this file.
Pay attention though: every line in the file ends with a line break. The print function adds a newline (a technical term for that line break) after everything it prints, by default. Since the lines in the file already have a newline, if we are not careful we will also be double-spacing the file! To avoid that, we end the call to print with the parameter end="" which says "Instead of the newline you would normally print at the end, print nothing."
In [10]:
fh = open( "Dropbox/beef_stew.txt" )
counter = 1
for line in fh:
    print("%d: %s" % ( counter, line ), end="")
    counter += 1
fh.close()
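Keeping a counter by hand works fine, but Python's built-in enumerate() function can do the counting for us. The sketch below shows that variation. The file name example_recipe.txt and its three lines are made up for the illustration, and the file is written to a temporary folder so that nothing of yours gets touched.

```python
import os
import tempfile

# Make a small stand-in file (hypothetical contents) in a temporary folder
folder = tempfile.mkdtemp()
recipe_path = os.path.join(folder, "example_recipe.txt")
fh = open(recipe_path, "w")
fh.write("brown the beef\nadd the vegetables\nsimmer for two hours\n")
fh.close()

# enumerate() hands back a running number together with each line
numbered = []
fh = open(recipe_path)
for counter, line in enumerate(fh, start=1):
    numbered.append("%d: %s" % (counter, line))
    print("%d: %s" % (counter, line), end="")
fh.close()
```

The start=1 part makes the numbering begin at 1 instead of Python's usual 0.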
When we open a file, we can either be reading from it or writing to it (but not both, at least not at this stage.)
When we write to a file that already exists, there are two options: either we will replace whatever was there before, or we will add to it. Let's try it, first to a new file and then adding something to that file.
When we use the open() function, it takes two parameters: the filename and a letter that indicates whether we want to read or write or what. If we don't give any letter, it assumes we meant 'r' for 'read'. The options are:
r - open the file for reading
w - empty the file and open it for writing
a - open the file for writing (appending) to the end; do NOT empty it.
We can see this in action by opening the old recipe for reading and a new recipe for writing. Where we read from the old file, we will use .write() on the new file.
The write function is a lot like print, only it does NOT assume that you want a newline at the end. In this case that is pretty convenient, since we don't!
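Before we get to the recipe, here is a small sketch of how the three letters behave when used on the same file. The file name notes.txt is just an example, and it lives in a temporary folder so that the experiment is harmless.

```python
import os
import tempfile

# A scratch file in a temporary folder; the name notes.txt is just an example
notes_path = os.path.join(tempfile.mkdtemp(), "notes.txt")

fh = open(notes_path, "w")   # "w" creates the file (or empties an existing one)
fh.write("first version\n")
fh.close()

fh = open(notes_path, "w")   # opening with "w" again throws the old contents away
fh.write("second version\n")
fh.close()

fh = open(notes_path, "a")   # "a" keeps what is there and adds to the end
fh.write("an added line\n")
fh.close()

fh = open(notes_path)        # no letter given, so this means "r"
contents = fh.read()
fh.close()
print(contents, end="")
```

The "first version" line is gone at the end, because the second "w" emptied the file before writing.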
In [11]:
old_recipe = open( "Dropbox/beef_stew.txt" )
new_recipe = open( "Dropbox/numbered_beef_stew.txt", "w" )
counter = 1
for line in old_recipe:
    new_recipe.write( "%d: %s" % ( counter, line ) )
    counter += 1 # this is the same as counter = counter + 1
old_recipe.close()
new_recipe.close()
So let's look at the new recipe! We open it for reading this time, instead of writing.
In [12]:
new_recipe = open( "Dropbox/numbered_beef_stew.txt" )
contents = new_recipe.read()
new_recipe.close()
print(contents)
Let's say we forgot a step at the end, and want to add it.
When we write a new line to the file, since we are using .write() and not print(), we have to make sure to add the newline at the end of the line ourselves. In Python this is written "\n".
Finally, although it is always important to close the files we open, it is especially important if we are writing to the file. If we forget to close a file we're writing to, then it's possible that not everything will get written!
In [13]:
new_recipe = open( "Dropbox/numbered_beef_stew.txt", "a" )
new_recipe.write( "18: give the leftovers to the cat\n" )
new_recipe.close()
And let's see if that worked. This time, instead of using .read() to put the file into a single variable, we will use .readlines() to put the file line by line into a list. This is useful if you want to read the file all in one go, but are going to want to do something with its contents line by line.
In [14]:
new_recipe = open( "Dropbox/numbered_beef_stew.txt" )
contents = new_recipe.readlines()
new_recipe.close()
for line in contents:
    print("-", line, end='')
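To make the difference between .read() and .readlines() concrete, here is a small side-by-side sketch. The file sample.txt and its contents are invented for the illustration and are written to a temporary folder.

```python
import os
import tempfile

# Write a three-line stand-in file (made-up contents) to a temporary folder
sample_path = os.path.join(tempfile.mkdtemp(), "sample.txt")
fh = open(sample_path, "w")
fh.write("one\ntwo\nthree\n")
fh.close()

fh = open(sample_path)
whole_text = fh.read()        # one single string
fh.close()

fh = open(sample_path)
lines = fh.readlines()        # a list with one string per line
fh.close()

print(repr(whole_text))
print(lines)
print(len(lines), "lines;", "second line is", repr(lines[1]))
```

Note that each string in the list still carries its "\n" at the end.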
You can open and close a file as often as you need to in the same program, as long as you always close it before reopening it somewhere else!
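Since forgetting to close a file is such an easy mistake, Python also offers the with statement, which closes the file for you when its indented block ends. This notebook closes its files by hand, so take the following as an optional alternative; the file name shopping.txt is made up, and it is written to a temporary folder.

```python
import os
import tempfile

log_path = os.path.join(tempfile.mkdtemp(), "shopping.txt")

# The file is closed automatically at the end of each "with" block,
# even if something goes wrong inside it
with open(log_path, "w") as fh:
    fh.write("beef\ncarrots\n")

with open(log_path) as fh:
    shopping = fh.readlines()

print(fh.closed)   # True: the with block has already closed the file
print(shopping)
```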
Once you know your way around the files and directories on your system, a lot of things start to make a lot more sense...